1 Why are we here?

A Non Reproducible workflow

2 Introducing RMarkdown

That’s why we are playing around with RMarkdown today.
Clearly, there’s no better way of doing so than throwing in a bunch of cats.

2.1 Why RMarkdown

RMarkdown is the easiest way to create interactive documents integrating text, code and the output from your code. It is fairly versatile, easy to learn and, as often happens in R, a bunch of people are continuously expanding its functionality and possibilities.

For instance, you can:

  • Compile a single R Markdown document to a report in different formats, such as PDF, HTML, or Word.
  • Create notebooks in which you can directly run code chunks interactively.
  • Make slides for presentations (HTML5, LaTeX Beamer, or PowerPoint).
  • Produce dashboards with flexible, interactive, and attractive layouts.
  • Build interactive applications based on Shiny.
  • Write journal articles.
  • Author books of multiple chapters.
  • Generate websites and blogs.

Yes, you guessed it. This very document has been generated using RMarkdown.

Not all that glitters is gold, though. Expect headaches when setting up your machine (especially on Windows) if you are interested in knitting pdfs or doing more advanced stuff.

2.2 Briefest history of RMarkdown ever

Source: https://bookdown.org/yihui/rmarkdown/
The document format “R Markdown” was first introduced in the knitr package (Xie 2015, 2020c) in early 2012. The idea was to embed code chunks (of R or other languages) in Markdown documents.

However, the original version of Markdown invented by John Gruber was often found overly simple and not suitable to write highly technical documents. For example, there was no syntax for tables, footnotes, math expressions, or citations. Fortunately, John MacFarlane created a wonderful package named Pandoc (http://pandoc.org) to convert Markdown documents (and many other types of documents) to a large variety of output formats. More importantly, the Markdown syntax was significantly enriched. Now we can write more types of elements with Markdown while still enjoying its simplicity.

In a nutshell, R Markdown stands on the shoulders of knitr and Pandoc. The former executes the computer code embedded in Markdown, and converts R Markdown to Markdown. The latter renders Markdown to the output format you want (such as PDF, HTML, Word, and so on).

The rmarkdown package (Allaire, Xie, McPherson, et al. 2020) was first created in early 2014.

2.3 What’s Markdown

Markdown is a lightweight markup language that you can use to add formatting elements to plain-text documents. Created by John Gruber in 2004, Markdown is now one of the world’s most popular markup languages.

It’s the language used to create READMEs and project descriptions on GitHub.

2.4 What’s LaTeX

LaTeX, which is pronounced «Lah-tech» or «Lay-tech» (to rhyme with «blech» or «Bertolt Brecht»), is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents but it can be used for almost any form of publishing.

LaTeX is not a word processor! Instead, LaTeX encourages authors not to worry too much about the appearance of their documents but to concentrate on getting the right content.

You need LaTeX only for knitting to pdf documents. If you’re happy with html notebooks, there’s no need to install it.

2.5 What’s Pandoc

If you need to convert files from one markup format into another, Pandoc is your Swiss Army knife.

The good news is that your RStudio IDE already has a Pandoc installation embedded! Maybe it’s not the latest, but it will work just fine most of the time.

3 Can we look at cats now?

Almost. Fire up your RStudio and install RMarkdown first.

install.packages("rmarkdown")

Time to start your first RMarkdown document.

File -> New File -> R Markdown

We’ll stick to html today. Knitting documents to pdf is undoubtedly cooler, but requires installing LaTeX, which might be tricky, depending on the machine you are using.

Congratulations - You’ve just created your first RMarkdown document.

3.1 What am I seeing?

You should be seeing something like this:
The first part is called the metadata.
The metadata is written between the pair of three dashes (---). The syntax for the metadata is YAML (YAML Ain’t Markup Language, https://en.wikipedia.org/wiki/YAML), so sometimes it is also called the YAML metadata or the YAML frontmatter. Before it bites you hard, we want to warn you in advance that indentation matters in YAML, so do not forget to indent the sub-fields of a top field properly.
(Source - https://bookdown.org/yihui/rmarkdown/basics.html).
It sounds intimidating. However, you won’t have to do much with your metadata most of the time, besides copy-pasting it from a template. Pheeeewww!
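
To see why indentation matters, here is a minimal sketch that parses a header with the yaml package (it gets installed together with rmarkdown, as one of its dependencies). The header text is a made-up example, not necessarily the one RStudio generates for you.

```r
library(yaml)

# A made-up YAML header: 'toc' is indented under 'html_document',
# which in turn is indented under 'output'
header <- '
title: "Small cats"
output:
  html_document:
    toc: true
'

meta <- yaml.load(header)

# The indentation defines how the fields are nested in the resulting list
meta$title
meta$output$html_document$toc
```

If you removed the indentation before toc, it would become a top-level field and html_document would no longer see it.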

In the rest of the script you can distinguish between text and chunks of code. Note how chunks of code are enclosed by three backtick signs (```).

If you can’t find the backtick sign on your keyboard, try the ASCII code: Alt+96.

Look at the two buttons highlighted by the arrows.

  • Insert — allows you to add a new chunk of code.
  • Knit — renders your document to the preferred file type.

3.2 What happens when I knit my notebook?

Knitting your RMarkdown script means rendering it to your chosen output (html in this case). There is quite a lot of machinery (=dark magic) happening behind the scenes. Fortunately, for most applications we don’t have to understand how these happen. Just enjoy the result.
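
If you are curious about the machinery anyway, the two steps can be run by hand. This is only a rough sketch of what the Knit button does, assuming a tiny made-up .Rmd file written to a temporary folder: knitr first turns the .Rmd into plain Markdown, then Pandoc converts the Markdown to html.

```r
library(knitr)

# Build a tiny, made-up .Rmd file in a temporary folder
fence    <- strrep("`", 3)                      # three backticks
rmd_file <- tempfile(fileext = ".Rmd")
md_file  <- sub("\\.Rmd$", ".md", rmd_file)
writeLines(c("A very small report.",
             "",
             paste0(fence, "{r}"),
             "4 + 4",
             fence),
           rmd_file)

# Step 1: knitr runs the code and writes a plain Markdown file
knit(rmd_file, output = md_file, quiet = TRUE)
md_lines <- readLines(md_file)
md_lines

# Step 2: Pandoc converts the Markdown file to html (skipped if Pandoc is missing)
if (rmarkdown::pandoc_available()) {
  html_file <- sub("\\.md$", ".html", md_file)
  rmarkdown::pandoc_convert(md_file, to = "html", output = html_file)
}
```

In everyday use, rmarkdown::render() (or the Knit button) takes care of both steps for you.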

3.3 How to create a beautiful RMarkdown document?

3.3.1 Playing around with Text

Let’s start with the easiest part: how to add text (=narrative) to your RMarkdown document. Easy-peasy. Just type!

You just need to know a couple of things:
Going to a new line -> Add two spaces at the end of the line.
Leaving an empty line -> Two ways: 1. <br>, 2. \newline (followed by an empty line).

Titles and Headers:
# Header - Header 1
## Header - Header 2
### Header - Header 3

Basic formatting. You can use some basic markdown formatting to make your text:
Italic: *Felis catus* –> Felis catus
Italic: _Felis catus_ –> Felis catus
Bold: **Felis catus** –> Felis catus
Bold: __Felis catus__ –> Felis catus
Both: ***Felis catus*** –> Felis catus
Both: ___Felis catus___ –> Felis catus

Adding Lynx:
[Eurasian Lynx - Wikipedia](https://en.wikipedia.org/wiki/Eurasian_lynx)
Renders as:
Eurasian Lynx - Wikipedia

Embed a cat image from url/file:
<center> ![Black Footed Cat](https://en.wikipedia.org/wiki/Black-footed_cat#/media/File:Black_Footed_Cat.jpg){height=300px} </center>

Renders as:

Black Footed Cat

Create numbered lists of cats (don’t forget to leave an empty line before starting the list). E.g.,
My favourite wildcats:

1. Andean Cat (*Leopardus jacobita*)
2. Rusty Spotted Cat (*Prionailurus rubiginosus*)
3. Chinese Mountain Cat (*Felis bieti*)
4. Kodkod (*Leopardus guigna*)

Renders as:

  1. Andean Cat (Leopardus jacobita)
  2. Rusty Spotted Cat (Prionailurus rubiginosus)
  3. Chinese Mountain Cat (Felis bieti)
  4. Kodkod (Leopardus guigna)

Create unordered lists of cats:

Where small cats live:

* Small Cats of South America
  + Andean Cat
  + Geoffroy’s Cat
  + Jaguarundi
  + ...
* Small Cats of SE Asia
  + Leopard Cat
  + Marbled Cat
  + Fishing Cat
  + ...

Renders as:

  • Small Cats of South America
    • Andean Cat
    • Geoffroy’s Cat
    • Jaguarundi
  • Small Cats of SE Asia
    • Leopard Cat
    • Marbled Cat
    • Fishing Cat

Do Yourself a Favour - and download the catsheet (=cheatsheet)
Catsheet - https://rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

3.3.2 Playing around with Code

To insert a chunk of code, just enclose it between three backticks, followed by {r}:
```{r}

4+4

```

It will render as:

4+4
## [1] 8

Let’s get some practice. First, we need to import some cat-related data.

big.cats <- read.table("data/Wikipedia_LargestCats.txt", header = T, sep="\t")
big.cats
##    Rank           Common.name   Scientific.name Weight.range.kg
## 1     1                 Tiger   Panthera tigris          90-300
## 2     2                  Lion      Panthera leo         160-270
## 3     3                Jaguar     Panthera onca          56-120
## 4     4                Cougar     Puma concolor          53-100
## 5     5               Leopard   Panthera pardus           17-90
## 6     6               Cheetah  Acinonyx jubatus           20-60
## 7     7          Snow leopard    Panthera uncia           22-55
## 8     8         Eurasian lynx         Lynx lynx           15-45
## 9     9 Sunda clouded leopard   Neofelis diardi           12-26
## 10   10       Clouded leopard Neofelis nebulosa         11.5-23
##    Maximum.weight.kg Maximum.length.m Native.range.by.continent
## 1             388.78             4.17                      Asia
## 2             375.00             3.64      Asia, Africa, Europe
## 3             160.00             2.60   North and South America
## 4             125.20             2.80   North and South America
## 5              96.50             2.75      Asia, Africa, Europe
## 6              72.00             2.10      Africa, Asia, Europe
## 7              75.00             2.50                      Asia
## 8              38.00             1.50              Asia, Europe
## 9              27.00             1.30                      Asia
## 10             23.00             1.08                      Asia

This data.frame contains the weight range, and the maximum observed weights and lengths of the ten largest wildcats. (Source: Wikipedia).

We then load tidyverse, a set of powerful packages for data manipulation and visualization.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

When loading tidyverse, we get a bunch of startup messages. Not so nice in a report. You can suppress them by opening your chunk with {r, warning=F, message=F}.
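
If you don’t want to repeat these arguments in every chunk, one option is to set them once for the whole document. The sketch below assumes you place it in a first "setup" chunk at the top of your .Rmd file; knitr::opts_chunk$set() changes the defaults for all the chunks that follow.

```r
library(knitr)

# Suppress warnings and messages in all the chunks that follow
opts_chunk$set(warning = FALSE, message = FALSE)

# Check the defaults now in force
opts_chunk$get("warning")
opts_chunk$get("message")
```

Arguments set in an individual chunk header still override these defaults for that chunk.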

Let’s write some real R code. In the chunk below, we split the weight range field into min and max using some fancy tidyr and dplyr code, both part of tidyverse.

big.cats <- big.cats %>% 
  separate(Weight.range.kg, into=c("Weight.min", "Weight.max"), sep = "-", remove = T) %>%
  mutate(Weight.min=as.numeric(Weight.min), 
         Weight.max=as.numeric(Weight.max)) %>% 
  mutate(Common.name=factor(Common.name, levels=big.cats$Common.name))
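
If you don’t have the data file at hand, you can try the same idea on a toy example. This is a minimal sketch: toy.cats and Weight.span are names made up for illustration, the values are copied from the table above, and convert = TRUE is used here as a shortcut for the as.numeric() calls.

```r
library(tidyr)
library(dplyr)

# Toy data mimicking two columns of big.cats
toy.cats <- data.frame(
  Common.name     = c("Tiger", "Lion", "Clouded leopard"),
  Weight.range.kg = c("90-300", "160-270", "11.5-23")
)

toy.cats <- toy.cats %>%
  # split "90-300" into two columns; convert = TRUE turns the pieces into numbers
  separate(Weight.range.kg, into = c("Weight.min", "Weight.max"),
           sep = "-", convert = TRUE) %>%
  # the new columns can be used in calculations straight away
  mutate(Weight.span = Weight.max - Weight.min)

toy.cats
```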

Maybe you want your report to automatically include a value calculated in your R script. This can be done with inline code: enclose the code in single backticks and start it with r. For instance, the code:
The cat having the highest weight is `r big.cats[1,'Common.name']`. It weighs up to `r big.cats[1,'Maximum.weight.kg']` kg.

Renders as:
The cat having the highest weight is Tiger.
It weighs up to 388.78 kg.
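
The example above works because the heaviest cat happens to sit in the first row. A slightly more robust sketch looks the value up instead. Here toy.cats, heaviest and sentence are made-up names, and the values are copied from the big.cats table.

```r
# Toy data, deliberately not sorted by weight
toy.cats <- data.frame(
  Common.name       = c("Lion", "Tiger", "Jaguar"),
  Maximum.weight.kg = c(375.00, 388.78, 160.00)
)

# Look the heaviest cat up instead of assuming it sits in row 1
heaviest <- toy.cats[which.max(toy.cats$Maximum.weight.kg), ]

# This is the sentence the inline code would produce in the report
sentence <- paste0("The heaviest cat is the ", heaviest$Common.name,
                   ". It weighs up to ", heaviest$Maximum.weight.kg, " kg.")
sentence
```

In your .Rmd file you would compute heaviest in a regular chunk, and then refer to heaviest$Common.name and heaviest$Maximum.weight.kg in the inline code.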

A couple of useful arguments when you start your chunk:
{r echo=F} — For running your code in the background, without showing the code itself.
{r eval=F} — Opposite. For showing your code, but without actually running it.

3.3.3 Playing around with Pictures

THE THING about RMarkdown is that it allows embedding graphs directly in your document. Input changed? Just re-knit and you’ll have all your graphs updated. It’s as easy as simply running a chunk of code.

Let’s make a graph to see the differences in cat size more easily, through a forest plot. We first install the fantastically useless package cats. All it does is randomly select a cat image to be used as the background of our ggplot graphs. It’s not on CRAN, so we also need the package remotes.

install.packages("remotes")
remotes::install_github("hilaryparker/cats") 

We can then load the package and use it to get our much-needed random cat image.

library(cats)
ggcats <- ggplot(data=big.cats) + 
  cats::add_cat() + ## add a random cat image on the background of the graph, if you fancy
  geom_segment(aes(y=Common.name, yend=Common.name, x=Weight.min, xend=Weight.max), 
               arrow = arrow(length = unit(5, "points"), 
                             ends="both", type = "closed", angle = 40)) + 
  ylab(NULL) +
  xlab(NULL) + 
  theme(axis.text = element_text(size=14))
ggcats

Some useful figure related arguments here:
{r fig.height=3} - picture height in inches
{r fig.width=3} - picture width in inches
{r fig.cap="add caption here"} - Add a caption
{r fig.align="center"} - Horizontal alignment of your graph
{r dpi=150} - Change resolution of output image (mostly relevant for pdf)

You can also combine them all together in a single line. For instance, if I rerun the chunk above specifying:
{r, fig.height=3, fig.width=4, fig.cap="Weight (kg) of the 10 largest wild cats", fig.align="center", dpi=150, echo=F}
[Figure: Weight (kg) of the 10 largest wild cats]

Note how the code isn’t visible anymore, having set echo=F. I swear it’s there, though.

3.3.4 Playing around with Tables

If you just print an R object on the console as we did above, RMarkdown will show it in the rendered document as well. It won’t look that good, though.

The entry level way of rendering tables is the knitr::kable function.

knitr::kable(head(big.cats[1:4,1:5]), caption="The largest cats!")
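The largest cats!

| Rank | Common.name | Scientific.name | Weight.min | Weight.max |
|-----:|:------------|:----------------|-----------:|-----------:|
|    1 | Tiger       | Panthera tigris |         90 |        300 |
|    2 | Lion        | Panthera leo    |        160 |        270 |
|    3 | Jaguar      | Panthera onca   |         56 |        120 |
|    4 | Cougar      | Puma concolor   |         53 |        100 |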
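
The column names are still the raw R names. One way to tidy them up is the col.names argument of kable. A minimal sketch on toy data follows; toy.cats, tab and the new column labels are made up for illustration.

```r
library(knitr)

# Toy data with values copied from the table above
toy.cats <- data.frame(
  Common.name = c("Tiger", "Lion"),
  Weight.min  = c(90, 160),
  Weight.max  = c(300, 270)
)

# col.names replaces the raw R names with reader-friendly headers
tab <- kable(toy.cats,
             col.names = c("Common name", "Min weight (kg)", "Max weight (kg)"),
             caption = "The largest cats!")
tab
```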

An even nicer way is using the kableExtra package. For instance, when rendering to html, kableExtra allows the creation of responsive tables. This is extremely useful for tables wider than an A4 page.

library(kableExtra)
knitr::kable(big.cats, caption="The largest cats!") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
    position = "center")
The largest cats!

| Rank | Common.name | Scientific.name | Weight.min | Weight.max | Maximum.weight.kg | Maximum.length.m | Native.range.by.continent |
|-----:|:------------|:----------------|-----------:|-----------:|------------------:|-----------------:|:--------------------------|
| 1 | Tiger | Panthera tigris | 90.0 | 300 | 388.78 | 4.17 | Asia |
| 2 | Lion | Panthera leo | 160.0 | 270 | 375.00 | 3.64 | Asia, Africa, Europe |
| 3 | Jaguar | Panthera onca | 56.0 | 120 | 160.00 | 2.60 | North and South America |
| 4 | Cougar | Puma concolor | 53.0 | 100 | 125.20 | 2.80 | North and South America |
| 5 | Leopard | Panthera pardus | 17.0 | 90 | 96.50 | 2.75 | Asia, Africa, Europe |
| 6 | Cheetah | Acinonyx jubatus | 20.0 | 60 | 72.00 | 2.10 | Africa, Asia, Europe |
| 7 | Snow leopard | Panthera uncia | 22.0 | 55 | 75.00 | 2.50 | Asia |
| 8 | Eurasian lynx | Lynx lynx | 15.0 | 45 | 38.00 | 1.50 | Asia, Europe |
| 9 | Sunda clouded leopard | Neofelis diardi | 12.0 | 26 | 27.00 | 1.30 | Asia |
| 10 | Clouded leopard | Neofelis nebulosa | 11.5 | 23 | 23.00 | 1.08 | Asia |

It won’t look that good in a pdf, but in an html report it’s just as good as it can possibly be.

3.3.5 Playing around with References

Yes, it is possible to add references to your markdown document. There are multiple ways for doing it. You can even link your Zotero library, if you wish. However, the easiest is probably to use the knitcitations package. We load it first.

install.packages("knitcitations")
library(knitcitations)

We can now cite any work inline simply by referring to its DOI. How?

The text:

Cats are not necessarily animals `r citep('10.1007/s10670-022-00588-w')`. But if they are, they should be left free to roam `r citep('10.1007/s12136-019-00408-x')`

will render as:


Cats are not necessarily animals (Hermida, 2022). But if they are, they should be left free to roam (Abbate, 2019)


To create a bibliography, we need to use the respective command.

bibliography()
## [1] C. Abbate. "A Defense of Free-Roaming Cats from a Hedonist Account
## of Feline Well-being". In: _Acta Analytica_ 35.3 (ott. 2019), pp.
## 439-461. DOI: 10.1007/s12136-019-00408-x.
## <https://doi.org/10.1007/s12136-019-00408-x>.
## 
## [2] M. Hermida. "Cats are not necessarily animals". In: _Erkenntnis_
## (ago. 2022). DOI: 10.1007/s10670-022-00588-w.
## <https://doi.org/10.1007/s10670-022-00588-w>.

…and yes, you can also change the citation style, but it is a bit laborious and we won’t deal with it here.

4 Slightly more advanced stuff

4.1 Long computing times

Whenever you knit your RMarkdown report, R will re-run all the code contained in your .Rmd script. It’s therefore not the best idea to include a chunk of code taking 3 hours to run. If you do so, even just correcting a typo in the text will require you to wait three hours before the corrected version of your report is rendered!

There are a couple of workarounds, though.

  1. add the argument {r cache=T} to your chunk:
    Adding this argument to the slowest chunks of code will save the intermediate results of these chunks in a dedicated folder. RMarkdown will only rerun these chunks when their code changes. Otherwise, it will skip the chunk and load the cached results directly.

  2. {r eval=F} + save and load
    Sometimes, it might be more convenient to run slow chunks of code in a dedicated, interactive session (maybe even on a different machine), save the results in a .RData file, and reload this saved data in your report. For transparency, you might still show the code you used to produce these intermediate outputs but, by setting eval=F, you’ll tell RMarkdown not to run this code.

Something like this.
Chunk 1 with slow code I ran elsewhere:

```{r eval=F}

output <- slow_function(input) # A real SLOW piece of code (slow_function is a placeholder)

save(output, file='output.slow.RData')

```

Chunk 2 reimporting the output of chunk 1:

```{r}

load(file='output.slow.RData')

```
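
A variation on the same idea is to let a single chunk decide whether the slow code needs to run at all. This is a minimal sketch under some assumptions: slow_function is a hypothetical stand-in for your slow code, the result is stored with saveRDS() rather than save(), and a temporary file is used so that the example runs anywhere.

```r
# Hypothetical stand-in for a slow computation
slow_function <- function(x) {
  Sys.sleep(1)  # pretend this takes three hours
  x^2
}

# Where the intermediate result is stored (a temporary file in this sketch)
result_file <- file.path(tempdir(), "output.slow.rds")

if (file.exists(result_file)) {
  output <- readRDS(result_file)       # fast path: reuse the saved result
} else {
  output <- slow_function(1:5)         # slow path: compute once...
  saveRDS(output, file = result_file)  # ...and save it for the next knit
}

output
```

Remember to delete the saved file whenever the slow code or its input changes, otherwise the report will keep showing the old result.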

4.2 Alternative ways of knitting

Clicking on the Knit button is convenient. The keyboard shortcut Ctrl+Shift+K is even more so.
Sometimes, you might want to knit multiple documents (yes, you can loop across several .Rmd scripts and knit hundreds of reports, even in parallel). To do so, you can knit an .Rmd file from the console as:

knitr::knit("KateRMarkdown.Rmd")
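
Note that knitr::knit() on its own stops at the Markdown file; to go all the way to html from the console you can use rmarkdown::render(), which also runs Pandoc. Below is a minimal sketch of a loop across several reports. The folder and the two tiny .Rmd files are made up for illustration, and the loop is sequential rather than parallel.

```r
library(rmarkdown)

# Create two tiny, made-up .Rmd files in a temporary folder
work_dir <- file.path(tempdir(), "many_reports")
dir.create(work_dir, showWarnings = FALSE)
for (cat_name in c("tiger", "lion")) {
  writeLines(c("---",
               paste0("title: \"Report on the ", cat_name, "\""),
               "output: html_document",
               "---",
               "",
               "Some text about this cat."),
             file.path(work_dir, paste0(cat_name, ".Rmd")))
}

# Find all the .Rmd files and render them one after the other
rmd_files <- list.files(work_dir, pattern = "\\.Rmd$", full.names = TRUE)
if (pandoc_available()) {
  for (f in rmd_files) {
    render(f, quiet = TRUE)  # writes the .html file next to each .Rmd file
  }
}

list.files(work_dir)
```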

4.3 PDFs and paper templates

I know. You can’t wait to produce your fantastic pdf reports and write your next paper directly in R.
Good news: it is possible. There are many templates out there, and you just have to fetch them. See for instance: https://t.co/uJBqWER5h6?amp=1
Yes, there are also templates that format bibliographic references as requested by different journals.

Bad news: you’ll need to set up your machine first, and it might be tricky sometimes. You can find some guidance at: https://bookdown.org/yihui/rmarkdown-cookbook/install-latex.html.

5 Resources

Many good resources out there. I only cite two:

  1. Bookdown - https://bookdown.org/yihui/rmarkdown/
  2. RMarkdown Cookbook - https://bookdown.org/yihui/rmarkdown-cookbook/

6 Your Turn!

Now it’s up to you to create a beautiful RMarkdown report full of cats. The more cats and the cuter they are, the better.

Pick a cute, wild species of cat and:

  1. Set up a new RMarkdown project
  2. Report a fun fact about your cat species as text in your report
  3. Embed a (copyright free) picture of that cat from the internet
  4. Download some data on that cat from GBIF using the rgbif package (help code below)
  5. Show the data in a table
  6. Create one graph based on that data (for instance a bar chart of the species occurrences across countries or sampling years, or a map of the coordinates). Any graph works
  7. Knit your project to html
  8. Upload your html report at: https://portal.idiv.de/nextcloud/index.php/s/ZK7XWMoAdXFbffQ. I will publish the nicest reports in the KateRMarkdown gallery.

Help code to download data from gbif

library(tidyverse)
library(rgbif)
myspecies <- "Caracal caracal"  ## example
get.speciesKey <- function(x){name_backbone(x)$speciesKey} #get GBIF species key
key <- get.speciesKey(myspecies)
# extract the first n occurrences from rgbif
get.occurrences <- function(x, n=100){occ_search(taxonKey=x, limit=n, 
                                             hasCoordinate = TRUE)}
# clean data: keep the occurrence table ($data) and a few useful columns
dat <- lapply(key, get.occurrences, n=100)[[1]]
dat <- dat$data  %>%  
  dplyr::select(species, year:day, country, stateProvince, 
                decimalLongitude, decimalLatitude)
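
Once you have the data, a bar chart of the occurrences across countries could look like the sketch below. To keep it runnable without an internet connection, it uses a few made-up records (toy.dat) with the same column names as dat; by.country, n.records and p are also names made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up occurrence records, with some of the same columns as `dat`
toy.dat <- data.frame(
  species = "Caracal caracal",
  country = c("South Africa", "South Africa", "Namibia", "Kenya", "South Africa"),
  year    = c(2019, 2020, 2020, 2021, 2021)
)

# Count the records per country
by.country <- toy.dat %>%
  count(country, name = "n.records")

# Horizontal bar chart, with the countries sorted by number of records
p <- ggplot(by.country, aes(x = n.records, y = reorder(country, n.records))) +
  geom_col() +
  xlab("Number of records") +
  ylab(NULL)
p
```

With real data, replacing toy.dat with dat should be enough.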

7 sessionInfo()

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22000)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Italian_Italy.utf8  LC_CTYPE=Italian_Italy.utf8   
## [3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=Italian_Italy.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] knitcitations_1.0.12 kableExtra_1.3.4     cats_0.1            
##  [4] forcats_0.5.2        stringr_1.5.0        dplyr_1.0.10        
##  [7] purrr_0.3.5          readr_2.1.3          tidyr_1.2.1         
## [10] tibble_3.1.8         ggplot2_3.4.0        tidyverse_1.3.2     
## [13] cowplot_1.1.1        hexSticker_0.4.9    
## 
## loaded via a namespace (and not attached):
##  [1] fs_1.5.2            lubridate_1.9.0     webshot_0.5.4      
##  [4] httr_1.4.4          tools_4.2.2         backports_1.4.1    
##  [7] bslib_0.4.1         utf8_1.2.2          R6_2.5.1           
## [10] DBI_1.1.3           colorspace_2.0-3    withr_2.5.0        
## [13] tidyselect_1.2.0    curl_4.3.3          compiler_4.2.2     
## [16] textshaping_0.3.6   cli_3.4.1           rvest_1.0.3        
## [19] xml2_1.3.3          labeling_0.4.2      sass_0.4.4         
## [22] scales_1.2.1        hexbin_1.28.3       systemfonts_1.0.4  
## [25] digest_0.6.31       yulab.utils_0.0.6   svglite_2.1.1      
## [28] rmarkdown_2.18      jpeg_0.1-10         pkgconfig_2.0.3    
## [31] htmltools_0.5.4     showtext_0.9-5      bibtex_0.5.1       
## [34] dbplyr_2.2.1        fastmap_1.1.0       highr_0.9          
## [37] rlang_1.0.6         readxl_1.4.1        rstudioapi_0.14    
## [40] sysfonts_0.8.8      gridGraphics_0.5-1  jquerylib_0.1.4    
## [43] farver_2.1.1        generics_0.1.3      jsonlite_1.8.4     
## [46] googlesheets4_1.0.1 magrittr_2.0.3      ggplotify_0.1.0    
## [49] Rcpp_1.0.9          munsell_0.5.0       fansi_1.0.3        
## [52] RefManageR_1.4.0    lifecycle_1.0.3     stringi_1.7.8      
## [55] yaml_2.3.6          plyr_1.8.8          grid_4.2.2         
## [58] crayon_1.5.2        lattice_0.20-45     haven_2.5.1        
## [61] hms_1.1.2           magick_2.7.4        knitr_1.41         
## [64] pillar_1.8.1        reprex_2.0.2        glue_1.6.2         
## [67] evaluate_0.19       ggimage_0.3.1       ggfun_0.0.9        
## [70] modelr_0.1.10       vctrs_0.5.1         tzdb_0.3.0         
## [73] cellranger_1.1.0    gtable_0.3.1        assertthat_0.2.1   
## [76] cachem_1.0.6        xfun_0.35           broom_1.0.1        
## [79] viridisLite_0.4.1   ragg_1.2.4          googledrive_2.0.0  
## [82] gargle_1.2.1        showtextdb_3.0      timechange_0.1.1   
## [85] ellipsis_0.3.2